Light has many properties that can be passively measured by vision sensors. Colour-band-separated wavelength and intensity are arguably the most commonly used ones for monocular 6D object pose estimation. This paper explores how complementary polarisation information, i.e., the orientation of light wave oscillations, can influence the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different levels of photometric complexity. Our design not only significantly improves pose accuracy with respect to photometric state-of-the-art approaches, but also enables object pose estimation for highly reflective and transparent objects.
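The abstract does not detail the physical priors; in polarimetric vision they are typically quantities derived from the Stokes parameters of a division-of-focal-plane polarization camera. A minimal sketch, assuming four intensity images captured behind 0°/45°/90°/135° polarizers (the function and array names are illustrative, not from the paper):

```python
import numpy as np

def polarization_priors(i0, i45, i90, i135, eps=1e-8):
    """Compute per-pixel degree of linear polarization (DoLP) and
    angle of linear polarization (AoLP) from four polarizer-angle
    intensity images via the Stokes parameters."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # diagonal components
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)      # in [-pi/2, pi/2]
    return dolp, aolp

# Example: random captures standing in for one polarization-camera frame.
rng = np.random.default_rng(0)
frames = [rng.uniform(0.0, 1.0, size=(480, 640)) for _ in range(4)]
dolp, aolp = polarization_priors(*frames)
print(dolp.shape, float(aolp.min()), float(aolp.max()))
```

Maps like these can be stacked with the RGB channels as extra network inputs, which is one plausible way a hybrid physics-plus-learning design could consume them.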
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems for the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of the achievable performance of different state-of-the-art SLU systems under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output allows SLU systems to improve over the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup and an 18% relative improvement over the 1-best configuration. Thus, crossmodal architectures represent a good alternative for overcoming the limitations of working purely with automatically generated textual data.
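The abstract does not spell out the fusion mechanism; a common crossmodal baseline simply concatenates utterance-level acoustic and text embeddings before a classification head. A minimal PyTorch sketch under that assumption (all dimensions and names are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class CrossmodalIntentClassifier(nn.Module):
    """Fuse an utterance-level acoustic embedding with a text embedding
    (e.g., from an ASR-transcript encoder) and predict an intent class."""
    def __init__(self, acoustic_dim=768, text_dim=768,
                 hidden_dim=256, num_intents=60):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(acoustic_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_intents),
        )

    def forward(self, acoustic_emb, text_emb):
        fused = torch.cat([acoustic_emb, text_emb], dim=-1)
        return self.head(fused)  # unnormalized intent logits

# Example with random stand-ins for the two modality encoders.
model = CrossmodalIntentClassifier()
logits = model(torch.randn(8, 768), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 60])
```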
This paper describes a simple yet efficient repetition-based modular system for speeding up air traffic controller (ATCO) training. For example, a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCO training; this need can instead be met by an automatic system acting as a pilot. In this paper, we aim to develop and integrate a pseudo-pilot agent into the ATCO training pipeline by merging diverse artificial intelligence (AI) powered modules. The system understands the voice communications issued by the ATCO and, in turn, generates a spoken response that follows the pilot's phraseology in reply to the initial communication. Our system relies mainly on open-source AI tools and air traffic control (ATC) databases, demonstrating its simplicity and ease of replication. The overall pipeline is composed of: (1) a submodule that receives and pre-processes the input stream of raw audio; (2) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (3) a high-level ATC-related entity parser that extracts relevant information from the communication, i.e., callsigns and commands; and finally (4) a speech synthesizer submodule that generates responses based on the previously extracted high-level ATC entities. Overall, we show that this system could pave the way toward a real proof-of-concept pseudo-pilot system, hence speeding up the training of ATCOs while drastically reducing its overall cost.
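A minimal sketch of how such a four-stage pipeline could be wired together; every function here is a hypothetical placeholder for the components the paper names (audio pre-processing, ASR, entity parsing, speech synthesis), not the authors' actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class AtcEntities:
    """High-level entities parsed from an ATCO communication."""
    callsign: str
    command: str
    value: str

def preprocess(raw_audio: bytes) -> bytes:
    """Placeholder: resample / denoise the incoming radio stream."""
    return raw_audio

def transcribe(audio: bytes) -> str:
    """Placeholder for the ASR module (audio -> word sequence)."""
    return "swiss two three descend flight level one zero zero"

def parse_entities(transcript: str) -> AtcEntities:
    """Toy rule: first three words are the callsign, then the command
    keyword, then its value; a real parser would use an ATC grammar."""
    words = transcript.split()
    return AtcEntities(callsign=" ".join(words[:3]),
                       command=words[3],
                       value=" ".join(words[4:]))

def synthesize_readback(entities: AtcEntities) -> str:
    """Pilots read commands back; a TTS engine would voice this text."""
    return f"{entities.command} {entities.value}, {entities.callsign}"

# End-to-end run of the pseudo-pilot loop on one transmission.
transcript = transcribe(preprocess(b"..."))
print(synthesize_readback(parse_entities(transcript)))
```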
Personal assistants, automatic speech recognizers, and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high-frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop data-driven AI systems; two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research in the challenging ATC field, which has lagged behind due to the lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotation of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate, and an English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set that we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus to foster research on robust ASR and NLU, not only in the field of ATC communications but also in the general research community.
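The per-sample metadata listed above naturally supports filtering the 5281-hour pseudo-labeled pool before training. A minimal sketch assuming a simple record type; the field names and thresholds are illustrative, not the corpus's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Atco2Sample:
    """Illustrative record mirroring the per-sample metadata the corpus
    description lists (automatic transcript, SNR, English score, ...)."""
    audio_path: str
    auto_transcript: str
    snr_db: float
    english_score: float

def select_for_training(samples, min_snr_db=5.0, min_english=0.5):
    """Keep pseudo-labeled samples that are clean and likely English."""
    return [s for s in samples
            if s.snr_db >= min_snr_db and s.english_score >= min_english]

pool = [
    Atco2Sample("a.wav", "swiss one tango contact tower", 12.0, 0.93),
    Atco2Sample("b.wav", "<noise>", 2.5, 0.30),
]
print([s.audio_path for s in select_for_training(pool)])  # ['a.wav']
```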
Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling air traffic controller (ATCO) and pilot data into one set. This is motivated by the fact that pilots' voice communications are scarcer than ATCOs'. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), the speech from ATCOs is usually recognized more accurately than that from pilots. Automatically identifying speaker roles is a challenging task, especially for noisy voice recordings collected using very-high-frequency (VHF) receivers, or when the push-to-talk (PTT) signal is unavailable, i.e., both audio channels are mixed. In this work, we propose to (1) automatically segment the ATCO and pilot data based on an intuitive approach exploiting ASR transcripts and (2) subsequently treat the automatic recognition of ATCOs' and pilots' voices as two separate tasks. Our work is performed on VHF audio data with high noise levels, i.e., signal-to-noise ratios (SNR) below 15 dB, as this data is valuable for various speech-based machine learning tasks. Specifically, the speaker role identification module is a simple yet efficient knowledge-based system exploiting a grammar defined by the International Civil Aviation Organization (ICAO). The system accepts text as input, either manually verified annotations or automatically generated transcripts. The developed approach provides an average speaker role identification accuracy of about 83%. Finally, we show that training acoustic models for ASR separately (i.e., separate models for ATCOs and pilots) or using a multitask approach is well suited to the noisy data and outperforms the traditional ASR system in which all data are pooled together.
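The abstract only says the classifier is a knowledge-based system over ICAO phraseology; a minimal sketch of what such a rule set could look like, with toy cue phrases that are illustrative rather than the paper's actual grammar:

```python
import re

# Toy phraseology cues: ATCOs issue instructions, pilots acknowledge
# and request. These cue lists are illustrative, not ICAO's grammar.
ATCO_CUES = re.compile(
    r"\b(cleared to land|descend|climb|turn (?:left|right)|contact|squawk)\b")
PILOT_CUES = re.compile(
    r"\b(wilco|roger|request|with you|ready for departure)\b")

def identify_role(transcript: str) -> str:
    """Label one transmission as 'atco', 'pilot', or 'unknown'
    by counting matched phraseology cues."""
    text = transcript.lower()
    atco_hits = len(ATCO_CUES.findall(text))
    pilot_hits = len(PILOT_CUES.findall(text))
    if atco_hits > pilot_hits:
        return "atco"
    if pilot_hits > atco_hits:
        return "pilot"
    return "unknown"

print(identify_role("ryanair nine two alpha descend flight level eight zero"))
print(identify_role("descending flight level eight zero wilco ryanair nine two alpha"))
```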
Several families of continual learning techniques have been proposed to alleviate catastrophic interference in deep neural network training on non-stationary data. However, a comprehensive comparison and analysis remain largely open due to the inaccessibility of suitable datasets. Empirical examination not only varies immensely between individual works, it further relies on contrived composition of benchmarks through subsampling and concatenation of various prevalent static vision datasets. In this work, our goal is to bridge this gap by introducing a computer graphics simulation framework that repeatedly renders upcoming urban scene fragments in an endless real-time procedural world generation process. At its core lies a modular parametric generative model with adaptable generative factors. The latter can be used to flexibly compose data streams, which significantly facilitates detailed analysis and allows for effortless investigation of various continual learning schemes.
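The abstract only names the idea of adaptable generative factors; a minimal sketch of how such factors could parameterize a non-stationary data stream (all names and factor choices are illustrative, not the framework's actual API):

```python
import itertools
from dataclasses import dataclass

@dataclass
class GenerativeFactors:
    """Illustrative scene parameters a procedural generator could expose."""
    weather: str        # e.g., "clear", "rain", "fog"
    time_of_day: float  # hours, 0-24
    object_density: float

def render_fragment(factors: GenerativeFactors) -> dict:
    """Placeholder for the renderer: returns one labeled sample."""
    return {"image": None, "labels": None, "factors": factors}

def data_stream(schedule):
    """Compose a non-stationary stream by sweeping generative factors
    according to a task schedule."""
    for factors in schedule:
        yield render_fragment(factors)

# Example: two 'tasks' that differ only in weather, i.e., a controlled
# distribution shift for a continual learner.
schedule = itertools.chain(
    (GenerativeFactors("clear", t, 0.5) for t in range(8, 12)),
    (GenerativeFactors("fog", t, 0.5) for t in range(8, 12)),
)
for sample in data_stream(schedule):
    print(sample["factors"])
```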
Although a plethora of deep classification architecture variants has been introduced over time, recent works have found empirical evidence of similarities in their training process. It has been hypothesized that neural networks not only converge to similar representations, but also exhibit a notion of empirical agreement on which data instances are learned first. Following in the latter works' footsteps, we define a metric to quantify the relationship between such classification agreement over time, and posit that the agreement phenomenon can be mapped to core statistics of the investigated dataset. We empirically corroborate this hypothesis across the CIFAR10, Pascal, ImageNet and KTH-TIPS2 datasets. Our findings suggest that agreement seems to be independent of the specific architecture, training hyperparameters, or labels, albeit it follows an ordering according to image statistics.
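The abstract does not give the metric's exact form; one simple instantiation is, per training epoch, the fraction of examples that two networks both classify correctly or both misclassify. A minimal sketch under that assumption:

```python
import numpy as np

def classification_agreement(correct_a: np.ndarray,
                             correct_b: np.ndarray) -> float:
    """Fraction of instances on which two models agree, where the inputs
    are boolean per-example correctness arrays at a given epoch."""
    return float(np.mean(correct_a == correct_b))

# Example: per-example correctness of two nets at one epoch.
rng = np.random.default_rng(1)
net_a = rng.random(10_000) < 0.7   # net A is right on ~70% of examples
net_b = rng.random(10_000) < 0.7
print(f"agreement: {classification_agreement(net_a, net_b):.3f}")
```

Tracking this quantity epoch by epoch yields the "agreement over time" curves that can then be related to dataset statistics.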
Driving is a routine activity for many, but it is far from simple. Drivers deal with multiple concurrent tasks, such as keeping the vehicle within the lane, observing and anticipating the actions of other road users, reacting to hazards, and dealing with distractions inside and outside the vehicle. Failure to notice and respond to surrounding objects and events can cause accidents. Ongoing improvements in road infrastructure and vehicle mechanical design have made driving safer overall; nevertheless, driver inattention remains one of the primary causes of accidents. Therefore, understanding where drivers look and why they do so can help eliminate sources of distraction and identify unsafe attention patterns. Driver attention has implications for many practical applications, such as policy-making, improving driver education, enhancing road infrastructure and in-vehicle infotainment systems, as well as designing systems for driver monitoring, driver assistance, and automated driving. This report covers the literature on changes in drivers' visual attention distribution due to factors internal and external to the driver. Aspects of attention during driving have been explored across multiple disciplines, including psychology, human factors, human-computer interaction, intelligent transportation, and computer vision, each offering different perspectives, goals, and explanations for the observed phenomena. We link cross-disciplinary theoretical and behavioral research on driver attention to practical solutions. Furthermore, limitations and directions for future research are discussed. This report is based on over 175 behavioral studies, nearly 100 practical papers, 20 datasets, and over 70 surveys published since 2010. The curated list of papers used in this report is available at https://github.com/ykotseruba/attention_and_driving.